AITopics | dimensionality reduction technique

Collaborating Authors

dimensionality reduction technique

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Influence of Data Dimensionality Reduction Methods on the Effectiveness of Quantum Machine Learning Models

Shinde, Aakash Ravindra, Nurminen, Jukka K.

arXiv.org Artificial IntelligenceNov-6-2025

Abstract--Data dimensionality reduction techniques are often utilized in the implementation of Quantum Machine Learning models to address two significant issues: the constraints of NISQ quantum devices, which are characterized by noise and a limited number of qubits, and the challenge of simulating a large number of qubits on classical devices. It also raises concerns over the scalability of these approaches, as dimensionality reduction methods are slow to adapt to large datasets. In this article, we analyze how data reduction methods affect different QML models. We conduct this experiment over several generated datasets, quantum machine algorithms, quantum data encoding methods, and data reduction methods. All these models were evaluated on the performance metrics like accuracy, precision, recall, and F1 score. Our findings have led us to conclude that the usage of data dimensionality reduction methods results in skewed performance metric values, which results in wrongly estimating the actual performance of quantum machine learning models. There are several factors, along with data dimensionality reduction methods, that worsen this problem, such as characteristics of the datasets, classical to quantum information embedding methods, percentage of feature reduction, classical components associated with quantum models, and structure of quantum machine learning models. We consistently observed the difference in the accuracy range of 14% to 48% amongst these models, using data reduction and not using it. Apart from this, our observations have shown that some data reduction methods tend to perform better for some specific data embedding methodologies and ansatz constructions. In recent decades, there has been a significant push towards research and development of Quantum Machine Learning algorithms and models. Quantum Machine Learning has also been heralded as one of the prominent use cases for Quantum Computing devices. Several studies have shown the ability of QML models to solve difficult machine-learning problems and sometimes outperform the classical approach. Mostly, these proofs are either theoretical or simulated on classical devices. This is because the current quantum computational devices lack the required number of qubits, have questionable error correction ability, and tend to have noisy qubits.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.0332

Country: Europe > Finland (0.14)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)

Add feedback

Epistemic Deep Learning: Enabling Machine Learning Models to Know When They Do Not Know

Manchingal, Shireen Kudukkil

arXiv.org Artificial IntelligenceOct-28-2025

Machine learning has achieved remarkable successes, yet its deployment in safety-critical domains remains hindered by an inherent inability to manage uncertainty, resulting in overconfident and unreliable predictions when models encounter out-of-distribution data, adversarial perturbations, or naturally fluctuating environments. This thesis, titled Epistemic Deep Learning: Enabling Machine Learning Models to 'Know When They Do Not Know', addresses these critical challenges by advancing the paradigm of Epistemic Artificial Intelligence, which explicitly models and quantifies epistemic uncertainty: the uncertainty arising from limited, biased, or incomplete training data, as opposed to the irreducible randomness of aleatoric uncertainty, thereby empowering models to acknowledge their limitations and refrain from overconfident decisions when uncertainty is high. Central to this work is the development of the Random-Set Neural Network (RS-NN), a novel methodology that leverages random set theory to predict belief functions over sets of classes, capturing the extent of epistemic uncertainty through the width of associated credal sets, applications of RS-NN, including its adaptation to Large Language Models (LLMs) and its deployment in weather classification for autonomous racing. In addition, the thesis proposes a unified evaluation framework for uncertainty-aware classifiers. Extensive experiments validate that integrating epistemic awareness into deep learning not only mitigates the risks associated with overconfident predictions but also lays the foundation for a paradigm shift in artificial intelligence, where the ability to 'know when it does not know' becomes a hallmark of robust and dependable systems. The title encapsulates the core philosophy of this work, emphasizing that true intelligence involves recognizing and managing the limits of one's own knowledge.

large language model, machine learning, second-order uncertainty representation, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.24384/G4DC-0E02

2510.22261

Country:

Europe (1.00)
North America > United States > California (0.27)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material (0.92)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Education (0.92)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

A Matter of Time: Revealing the Structure of Time in Vision-Language Models

Tekaya, Nidham, Waldner, Manuela, Zeppelzauer, Matthias

arXiv.org Artificial IntelligenceOct-23-2025

Large-scale vision-language models (VLMs) such as CLIP have gained popularity for their generalizable and expressive multimodal representations. By leveraging large-scale training data with diverse textual metadata, VLMs acquire open-vocabulary capabilities, solving tasks beyond their training scope. This paper investigates the temporal awareness of VLMs, assessing their ability to position visual content in time. We introduce TIME10k, a benchmark dataset of over 10,000 images with temporal ground truth, and evaluate the time-awareness of 37 VLMs by a novel methodology. Our investigation reveals that temporal information is structured along a low-dimensional, non-linear manifold in the VLM embedding space. Based on this insight, we propose methods to derive an explicit ``timeline'' representation from the embedding space. These representations model time and its chronological progression and thereby facilitate temporal reasoning tasks. Our timeline approaches achieve competitive to superior accuracy compared to a prompt-based baseline while being computationally efficient. All code and data are available at https://tekayanidham.github.io/timeline-page/.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3746027.3758163

2510.19559

Country: Europe > Austria (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Add feedback

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsOct-2-2025, 18:11:34 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The authors present an efficient method for dimensionality reduction which preserves subspace independence and, as consequence, is useful for problems such as classification where we assume that the data are separable on a lower-dimensional subspace. They show that the subspace structure preservation can be achieved with 2K projects, and they perform a detailed comparison with other methods on benchmark data. The paper is very thorough and well-presented. It presents a novel and effective method for an important problem in dimensionality reduction, and it does so in a way that makes it clear that the method is competitive with popular alternatives.

assumption, dimensionality reduction, subspace, (9 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Summary/Review (0.49)

Technology:

Information Technology > Data Science > Data Mining (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.61)

Add feedback

Dimensionality Reduction with Subspace Structure Preservation

Neural Information Processing SystemsSep-30-2025, 08:28:45 GMT

Modeling data as being sampled from a union of independent subspaces has been widely applied to a number of real world applications. However, dimensionality reduction approaches that theoretically preserve this independence assumption have not been well studied. Our key contribution is to show that $2K$ projection vectors are sufficient for the independence preservation of any $K$ class data sampled from a union of independent subspaces. It is this non-trivial observation that we use for designing our dimensionality reduction technique. In this paper, we propose a novel dimensionality reduction algorithm that theoretically preserves this structure for a given dataset. We support our theoretical analysis with empirical results on both synthetic and real world data achieving \textit{state-of-the-art} results compared to popular dimensionality reduction techniques.

dimensionality reduction, name change, subspace structure preservation, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Linearized Diffusion Map

Candanedo, Julio

arXiv.org Artificial IntelligenceJul-22-2025

We introduce the Linearized Diffusion Map (LDM), a novel linear dimensionality reduction method constructed via a linear approximation of the diffusion-map kernel. LDM integrates the geometric intuition of diffusion-based nonlinear methods with the computational simplicity, efficiency, and interpretability inherent in linear embeddings such as PCA and classical MDS. Through comprehensive experiments on synthetic datasets (Swiss roll and hyperspheres) and real-world benchmarks (MNIST and COIL-20), we illustrate that LDM captures distinct geometric features of datasets compared to PCA, offering complementary advantages. Specifically, LDM embeddings outperform PCA in datasets exhibiting explicit manifold structures, particularly in high-dimensional regimes, whereas PCA remains preferable in scenarios dominated by variance or noise. Furthermore, the complete positivity of LDM's kernel matrix allows direct applicability of Non-negative Matrix Factorization (NMF), suggesting opportunities for interpretable latent-structure discovery. Our analysis positions LDM as a valuable new linear dimensionality reduction technique with promising theoretical and practical extensions.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2507.14257

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Improving Clustering on Occupational Text Data through Dimensionality Reduction

García, Iago Xabier Vázquez, Partanaz, Damla, Yetkin, Emrullah Fatih

arXiv.org Artificial IntelligenceJul-11-2025

In this study, we focused on proposing an optimal clustering mechanism for the occupations defined in the well-known US-based occupational database, O*NET. Even though all occupations are defined according to well-conducted surveys in the US, their definitions can vary for different firms and countries. Hence, if one wants to expand the data that is already collected in O*NET for the occupations defined with different tasks, a map between the definitions will be a vital requirement. We proposed a pipeline using several BERT-based techniques with various clustering approaches to obtain such a map. We also examined the effect of dimensionality reduction approaches on several metrics used in measuring performance of clustering algorithms. Finally, we improved our results by using a specialized silhouette approach. This new clustering-based mapping approach with dimensionality reduction may help distinguish the occupations automatically, creating new paths for people wanting to change their careers.

artificial intelligence, data mining, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2507.07582

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(5 more...)

Genre: Research Report > New Finding (0.86)

Industry:

Education (0.68)
Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.84)

Add feedback

Scalable Machine Learning Algorithms using Path Signatures

Tóth, Csaba

arXiv.org Machine LearningJun-26-2025

The interface between stochastic analysis and machine learning is a rapidly evolving field, with path signatures - iterated integrals that provide faithful, hierarchical representations of paths - offering a principled and universal feature map for sequential and structured data. Rooted in rough path theory, path signatures are invariant to reparameterization and well-suited for modelling evolving dynamics, long-range dependencies, and irregular sampling - common challenges in real-world time series and graph data. This thesis investigates how to harness the expressive power of path signatures within scalable machine learning pipelines. It introduces a suite of models that combine theoretical robustness with computational efficiency, bridging rough path theory with probabilistic modelling, deep learning, and kernel methods. Key contributions include: Gaussian processes with signature kernel-based covariance functions for uncertainty-aware time series modelling; the Seq2Tens framework, which employs low-rank tensor structure in the weight space for scalable deep modelling of long-range dependencies; and graph-based models where expected signatures over graphs induce hypo-elliptic diffusion processes, offering expressive yet tractable alternatives to standard graph neural networks. Further developments include Random Fourier Signature Features, a scalable kernel approximation with theoretical guarantees, and Recurrent Sparse Spectrum Signature Gaussian Processes, which combine Gaussian processes, signature kernels, and random features with a principled forgetting mechanism for multi-horizon time series forecasting with adaptive context length. We hope this thesis serves as both a methodological toolkit and a conceptual bridge, and provides a useful reference for the current state of the art in scalable, signature-based learning for sequential and structured data.

artificial intelligence, bayesian inference, deep learning, (13 more...)

arXiv.org Machine Learning

2506.17634

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.13)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
(11 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.67)
Research Report > New Finding (0.45)

Industry:

Information Technology (1.00)
Energy (1.00)
Education (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Add feedback

Efficient Malware Detection with Optimized Learning on High-Dimensional Features

Choudhary, Aditya, Pawar, Sarthak, Haribhakta, Yashodhara

arXiv.org Artificial IntelligenceJun-24-2025

Malware detection using machine learning requires feature extraction from binary files, as models cannot process raw binaries directly. A common approach involves using LIEF for raw feature extraction and the EMBER vectorizer to generate 2381-dimensional feature vectors. However, the high dimensionality of these features introduces significant computational challenges. This study addresses these challenges by applying two dimensionality reduction techniques: XGBoost-based feature selection and Principal Component Analysis (PCA). We evaluate three reduced feature dimensions (128, 256, and 384), which correspond to approximately 5.4%, 10.8%, and 16.1% of the original 2381 features, across four models-XGBoost, LightGBM, Extra Trees, and Random Forest-using a unified training, validation, and testing split formed from the EMBER-2018, ERMDS, and BODMAS datasets. This approach ensures generalization and avoids dataset bias. Experimental results show that LightGBM trained on the 384-dimensional feature set after XGBoost feature selection achieves the highest accuracy of 97.52% on the unified dataset, providing an optimal balance between computational efficiency and detection performance. The best model, trained in 61 minutes using 30 GB of RAM and 19.5 GB of disk space, generalizes effectively to completely unseen datasets, maintaining 95.31% accuracy on TRITIUM and 93.98% accuracy on INFERNO. These findings present a scalable, compute-efficient approach for malware detection without compromising accuracy.

artificial intelligence, deep learning, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2506.17309

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Improved Dimensionality Reduction for Inverse Problems in Nuclear Fusion and High-Energy Astrophysics

Gorard, Jonathan, Hakim, Ammar, Qin, Hong, Parfrey, Kyle, Jha, Shantenu

arXiv.org Artificial IntelligenceMay-9-2025

Many inverse problems in nuclear fusion and high-energy astrophysics research, such as the optimization of tokamak reactor geometries or the inference of black hole parameters from interfer-ometric images, necessitate high-dimensional parameter scans and large ensembles of simulations to be performed. Such inverse problems typically involve large uncertainties, both in the measurement parameters being inverted and in the underlying physics models themselves. Monte Carlo sampling, when combined with modern non-linear dimensionality reduction techniques such as autoencoders and manifold learning, can be used to reduce the size of the parameter spaces considerably. However, there is no guarantee that the resulting combinations of parameters will be physically valid, or even mathematically consistent. In this position paper, we advocate adopting a hybrid approach that leverages our recent advances in the development of formal verification methods for numerical algorithms, with the goal of constructing parameter space restrictions with provable mathematical and physical correctness properties, whilst nevertheless respecting both experimental uncertainties and uncertainties in the underlying physical processes.

artificial intelligence, dimensionality reduction technique, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2505.03849

Country: North America > United States > New Jersey > Mercer County > Princeton (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.98)

Add feedback